- Education > Educational Setting > Higher Education (0.59)
- Education > Curriculum > Subject-Specific Education (0.59)
- North America > United States > California (0.04)
- North America > Canada (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (0.93)
Secure, Scalable and Privacy Aware Data Strategy in Cloud
Butte, Vijay Kumar, Butte, Sujata
Enterprises today face the difficult challenge of processing and storing large amounts of data in a secure, scalable manner while enabling decision makers to make quick, informed, data-driven decisions. This paper addresses this challenge by developing an effective enterprise data strategy in the cloud. The components of an effective data strategy are discussed, and architectures addressing security, scalability, and privacy are provided.
- North America > Canada > Ontario > Toronto (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
- North America > United States > California (0.04)
- Information Technology > Security & Privacy (0.46)
- Health & Medicine > Diagnostic Medicine > Imaging (0.46)
1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models
Park, Chanjun, Ha, Hyunsoo, Kim, Jihoo, Kim, Yungi, Kim, Dahyun, Lee, Sukyung, Yang, Seonghoon
In this paper, we propose the 1 Trillion Token Platform (1TT Platform), a novel framework designed to facilitate efficient data sharing with a transparent and equitable profit-sharing mechanism. The platform fosters collaboration between data contributors, who provide otherwise non-disclosed datasets, and a data consumer, who utilizes these datasets to enhance their own services. Data contributors are compensated in monetary terms, receiving a share of the revenue generated by the services of the data consumer. The data consumer is committed to sharing a portion of the revenue with contributors, according to predefined profit-sharing arrangements. By incorporating a transparent profit-sharing paradigm to incentivize large-scale data sharing, the 1TT Platform creates a collaborative environment to drive the advancement of NLP and LLM technologies.
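The pro-rata compensation idea in the abstract can be sketched in a few lines. The function name, the token-based split, and all figures below are illustrative assumptions, not the platform's actual formula.

```python
# Hypothetical sketch: each contributor receives a slice of the revenue pool
# (the consumer's committed share of revenue) proportional to the number of
# tokens they contributed. All names and rates here are assumptions.

def share_revenue(contributions: dict[str, int], revenue: float,
                  share_rate: float) -> dict[str, float]:
    """Split share_rate * revenue across contributors, pro rata by token count."""
    total_tokens = sum(contributions.values())
    pool = revenue * share_rate  # portion of revenue committed to contributors
    return {name: pool * tokens / total_tokens
            for name, tokens in contributions.items()}

payouts = share_revenue({"org_a": 600_000, "org_b": 400_000},
                        revenue=10_000.0, share_rate=0.2)
# org_a receives 1200.0 and org_b receives 800.0 of the 2000.0 shared pool
```

Other allocation rules (e.g. weighting by data quality rather than raw token count) would slot into the same structure.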
Compound Schema Registry
Schema evolution is critical in managing database systems to ensure compatibility across different data versions. A schema registry typically addresses the challenges of schema evolution in real-time data streaming by managing, validating, and ensuring schema compatibility. However, current schema registries struggle with complex syntactic alterations like field renaming or type changes, which often require significant manual intervention and can disrupt service. To enhance the flexibility of schema evolution, we propose the use of generalized schema evolution (GSE) facilitated by a compound AI system. This system employs Large Language Models (LLMs) to interpret the semantics of schema changes, supporting a broader range of syntactic modifications without interrupting data streams. Our approach includes developing a task-specific language, Schema Transformation Language (STL), to generate schema mappings as an intermediate representation (IR), simplifying the integration of schema changes across different data processing platforms. Initial results indicate that this approach can improve schema mapping accuracy and efficiency, demonstrating the potential of GSE in practical applications.
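A schema mapping used as an intermediate representation can be pictured as a list of operations applied to each record in the stream. The STL syntax itself is not shown in the abstract, so the rename/cast operation format below is an assumption made for illustration.

```python
# Illustrative sketch of a schema-mapping IR: a list of rename and type-cast
# operations applied to each record. The operation format is an assumption,
# not the paper's actual Schema Transformation Language.

def apply_mapping(record: dict, mapping: list[dict]) -> dict:
    """Apply rename and cast operations to one record; other fields pass through."""
    out = dict(record)
    for op in mapping:
        if op["op"] == "rename" and op["from"] in out:
            out[op["to"]] = out.pop(op["from"])
        elif op["op"] == "cast" and op["field"] in out:
            out[op["field"]] = op["type"](out[op["field"]])
    return out

mapping = [
    {"op": "rename", "from": "user_name", "to": "username"},  # field renaming
    {"op": "cast", "field": "age", "type": int},              # type change
]
apply_mapping({"user_name": "ada", "age": "36"}, mapping)
# returns {'age': 36, 'username': 'ada'}
```

In the system described, an LLM would generate such a mapping from the old and new schemas; downstream platforms then only need to execute the IR, not interpret the semantic change themselves.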
Strong statistical parity through fair synthetic data
Krchova, Ivona, Platzer, Michael, Tiwald, Paul
AI-generated synthetic data, in addition to protecting the privacy of original data sets, allows users and data consumers to tailor data to their needs. This paper explores the creation of synthetic data that embodies Fairness by Design, focusing on the statistical parity fairness definition. By equalizing the learned target probability distributions of the synthetic data generator across sensitive attributes, a downstream model trained on such synthetic data provides fair predictions across all thresholds, that is, strongly fair predictions even when inferring from biased, original data. This fairness adjustment can either be directly integrated into the sampling process of a synthetic generator or added as a post-processing step. This flexibility allows data consumers to create fair synthetic data and fine-tune the trade-off between accuracy and fairness without any prior assumptions about the data or re-training of the synthetic data generator.
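The statistical parity criterion the abstract targets can be checked with a short sketch: positive-prediction rates should be (near) equal across sensitive groups at a given threshold. The data, group labels, and threshold below are illustrative assumptions.

```python
# Minimal sketch of statistical parity: compare the per-group rate of positive
# predictions from a scoring model. Scores, groups, and threshold are made up.

def positive_rates(scores, groups, threshold=0.5):
    """Per-group fraction of scores at or above the decision threshold."""
    rates = {}
    for g in set(groups):
        picks = [s >= threshold for s, grp in zip(scores, groups) if grp == g]
        rates[g] = sum(picks) / len(picks)
    return rates

def parity_gap(scores, groups, threshold=0.5):
    """Statistical parity difference: max minus min group positive rate."""
    r = positive_rates(scores, groups, threshold)
    return max(r.values()) - min(r.values())

scores = [0.9, 0.4, 0.7, 0.2, 0.8, 0.3]
groups = ["a", "a", "a", "b", "b", "b"]
parity_gap(scores, groups)  # group a: 2/3 positive, group b: 1/3, gap 1/3
```

A generator tuned for statistical parity would aim to drive this gap toward zero for models trained on its synthetic output, at every threshold rather than a single one.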